An Analysis of Radicals-based Features in Subjectivity Classification on Simplified Chinese Sentences
Authors
Abstract
Chinese radicals are linguistic elements smaller than Chinese characters. Normally, a radical denotes a semantic category, and almost all characters either contain radicals or are radicals themselves. In subjectivity classification of sentences, we can use radicals to represent characters, which reduces the size of the feature space while keeping the subjectivity information. In this paper, we manually labeled a character set to build a high-quality radical-character mapping, and then used the mapping to generalize character-based features with radicals. In the experiments, we first evaluated performance when characters are directly generalized with radicals, and then proposed a hypothesis that can reduce noise. The experiments show that the approach based on our hypothesis can reduce the feature space while maintaining or improving performance, which is especially useful when training samples are scarce.
Keywords: sentiment analysis, subjectivity classification, radical, Chinese character
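The direct generalization step described in the abstract can be illustrated with a minimal sketch. The mapping `RADICAL_OF` below is a small hypothetical example written for illustration, not the authors' manually labeled mapping, and the function names are assumptions. The noise-reducing hypothesis mentioned in the abstract is not specified here, so it is not modeled.

```python
from collections import Counter

# Hypothetical toy radical-character mapping (illustration only; the paper's
# manually labeled mapping is not reproduced here).
RADICAL_OF = {
    "河": "氵", "海": "氵", "湖": "氵",  # water-related characters
    "想": "心", "感": "心",              # heart/mind-related characters
    "吃": "口", "喝": "口",              # mouth-related characters
}

def char_features(sentence):
    """Bag-of-characters features: count each non-space character."""
    return Counter(ch for ch in sentence if not ch.isspace())

def radical_features(sentence, mapping=RADICAL_OF):
    """Replace each mapped character by its radical, then count.

    Characters missing from the mapping are kept unchanged.
    """
    return Counter(mapping.get(ch, ch) for ch in sentence if not ch.isspace())

def vocabulary(sentences, featurize):
    """Set of distinct features observed over a list of sentences."""
    vocab = set()
    for s in sentences:
        vocab.update(featurize(s))
    return vocab

corpus = ["河海湖", "想感"]
char_vocab = vocabulary(corpus, char_features)
radical_vocab = vocabulary(corpus, radical_features)
```

In this toy corpus the five distinct characters collapse to two radical features, which is the kind of feature-space reduction the abstract refers to.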
Similar resources
MHSubLex: Using Metaheuristic Methods for Subjectivity Classification of Microblogs
In Web 2.0, people are free to share their experiences, views, and opinions. One of the problems that arises in Web 2.0 is the sentiment analysis of texts produced by users in outlets such as Twitter. One of the main tasks of sentiment analysis is subjectivity classification. Our aim is to classify the subjectivity of Tweets. To this end, we create subjectivity lexicons in which the words into ...
Predictive Features for Detecting Indefinite Polar Sentences
In recent years, text classification in sentiment analysis has mostly focused on two types of classification, the distinction between objective and subjective text, i.e. subjectivity detection, and the distinction between positive and negative subjective text, i.e. polarity classification. So far, there has been little work examining the distinction between definite polar subjectivity and indef...
Beyond Topicality: Finding Opinionated Chinese Documents
The availability of Web 2.0 technologies has made it easy for information users to express their own opinions and access other people’s opinions on the Web. We are interested in understanding how opinions expressed in one way by one group compare to opinions expressed in another way by another group, especially in a different language. We have done reasonably well at finding opinionated English...
Political Leaning Categorization by Exploring Subjectivities in Political Blogs
This paper addresses a relatively new text categorization problem: classifying a political blog as either ‘liberal’ or ‘conservative’, based on its political leaning. Instead of simply using “Bag of Words” features (BoW) as in previous work, we have explored subjectivity manifested in blogs and used subjectivity information thus found to help build political leaning classifiers. Specifically, o...
Classifying Attitude by Topic Aspect for English and Chinese Document Collections
Title: Classifying Attitude by Topic Aspect for English and Chinese Document Collections. Yejun Wu, Doctor of Philosophy, 2008. Dissertation directed by: Professor Douglas W. Oard, College of Information Studies & Institute for Advanced Computer Studies, UMCP. The goal of this dissertation is to explore the design of tools to help users make sense of subjective information in English and Chinese by...